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Abstract 

Time-of-flight measurements such as the OPERA and MINOS exper- 
iments rely crucially on statistical analysis (as well as many other ingre- 
dients) for their conclusions. The nature of these experiments leads to 
a simple class of statistical models for the results; however, which model 
in the class is appropriate is not known exactly, as this depends on in- 
formation obtained experimentally, which is subject to noise and other 
errors. To obtain robust conclusions, this problem, known as "model un- 
certainty," needs to be addressed, with quantitative bounds on the effect 
such uncertainty may have on the final result. 

The OPERA (and MINOS) analysis appears to take steps to mitigate 
the effects of model uncertainty, though without quantifying any remain- 
ing effect. We describe one of the strategies used (averaging individual 
probability distributions), and point out a potential source of error if this 
is not carried out correctly. We then argue that the correct version of 
this strategy is not the most effective, and suggest possible alternatives. 
These alternatives may give more accurate statistical results using the 
same data, allowing, for example, more accurate determination of the de- 
pendence of the anomalous time shift on energy. Which strategies work 
and how well can only be evaluated with access to the full data. 

Whether or not the anomalous result from OPERA turns out to be 
confirmed, we believe that techniques such as those presented here may 
be appropriate for the analysis of other timing experiments of this type. 



1 Introduction 

The fundamental statistical question arising from the OPERA timing experi- 
ment has the following form: the observations (neutrino arrival times) can be 
seen as independent samples Tj from a family rij(t) of probability distributions 
shifted by a common unknown offset, and the task is to estimate this offset. Un- 
fortunately, the rii(t) are not known exactly; rather some proxy measurement 
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Pi(t) is available, and the statistics must take this into account. There may also 
be small errors in the measurement of the Tj that are not independent. 

In the specific case of the OPERA experiment, Pi(t), the i proton wave- 
form, is the output of the waveform digitizer connected to the BCT (Beam 
Current Transformer) for the i th proton pulse (or "spill"). Following [T], we 
write St for the (unknown) difference TOF c - TOFJ3 between the actual time- 
of- flight and that expected (distance divided by c). Let 

At TOF c -\- ^ixmd ^pmd 

denote the expected time delay between the proton waveforms and neutrino 
detection events in the case that neutrinos move at the speed of light, where 
tvmd and i pm d incorporate all known neutrino and proton measurement delays 
respectively. Then we may define rii{t) by saying that detection events for the 
interval [t + At — St, t + At — St + dt) correspond to a Poisson process with rate 
rii(t)dt. The notation is set so that Pi(t) and rii(t) have the same time origin 
(and would be proportional under idealized conditions), but rii(t) is defined in 
terms of detection events. This gives us the notational freedom to hypothesize 
any relationship between Pi(t) and rii(t). In this paper we shall indicate how 
one may cope with certain unknown differences between them. 

If the chance of a neutrino giving rise to a detection event is independent 
of its position in the 10.5/iS pulse, then rii(t) can be thought of as the neutrino 
waveform. This is a fairly mild assumption, but even so requires justification 
since, for example, it could be imagined that for some reason the neutrinos 
towards the end of the 10.5/xs pulse tend to be of slightly higher or lower en- 
ergy than those at the start, and that the detector has different sensitivities to 
neutrinos of different energies. 

In the idealized case that the BCT measures the proton current exactly, that 
each proton has the same chance of ultimately leading to a neutrino detection, 
and that the relevant clocks are perfectly synchronized, then rii(t) and Pi(t) are 
equal. However, they are clearly not exactly equal for a number of reasons; see, 
for example, the discussion of oscillations on page 125 of [3]. Since only pi{t) 
can be measured, but naive statistical analysis requires knowledge of rii(t), a 
problem called "model uncertainty" arises. To obtain reliable statistical results, 
it is necessary to quantify the extent of this uncertainty (the difference between 
the Pi(t) and the iii(t)), or at least to argue explicitly that it is negligible; it is 
also necessary to quantify how sensitive the output of the statistical estimation 
procedure is to changes in the "input" rii(t). 

More explicitly, given rn(t), the task of estimating St is a simple closed 
mathematical problem. We can think of the functions rii{t) as leading to a one- 
parameter family (parameterized by St) of models of when we expect neutrinos 
to be detected, and then standard techniques such as maximum likelihood es- 
timation will work well. But in practice what we actually know is Pi{t), so we 
don't know which model rii{t) to use. As far as possible we'd like to model the 
model uncertainty statistically and just make a bigger model, but some types 

1 We follow the sign convention for 6t in pQ. 
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of uncertainty may be hard to model this way, for example because they might 
be non-independent for different i. 

In the arXiv preprint pQ posted by the OPERA collaboration it is implicitly 
assumed that any failure of rii (t) to be proportional to pi (t) is accounted for in 
the systematic errors. However, it is possible that discrepancies between Pi(t) 
and iii(t) could reinforce each other in such a way as to create a much greater 
discrepancy in the final estimate of St. Similarly, the estimate of St could be 
particularly sensitive to small discrepancies in particular parts of the waveform; 
a qualitative discussion of a potential effect of this type is given in pi] . Therefore 
we suggest that the statistical analysis should be conducted so as to be as robust 
as possible against such discrepancies of rii(t) from Pi(t), and we give examples 
of how this might be done for certain classes of discrepancies. 

One strategy adopted in [1] for (it appears) precisely the purpose of reduc- 
ing the effect of noise in the measured current Pi(t) involves summing the Pi(t) 
to make an aggregate proton waveform w(t). [For simplicity of notation, and 
because it is not relevant to the discussion here, we omit the distinction between 
the first and second extractions; when we speak of summing all proton wave- 
forms, this can be understood to mean all waveforms within the first (or second) 
extraction.] We believe that this process is potentially flawed, for a reason given 
below, and if not flawed, then it appears not to be making best use of the infor- 
mation available. If the Pi(t) are treated separately for each i, then it is likely 
that a more accurate result can be obtained, which would, for example, enable 
one to probe the energy dependence of St with greater precision. Whether this 
is in fact possible depends on details of the waveforms Pi(t), which do not seem 
to be publicly available at the time of writing, as well as on properties of the 
discrepancies between the rn{t) and the Pi(t). 

2 Summing proton waveforms 

In Section 7 of pQ and Section 8.1 of 3 , it is stated that the aggregate dis- 
tribution (PDF) w(t) is constructed by adding pi(t) over i corresponding to 
neutrino detections, then normalizing the result. This procedure is statistically 
flawed for reasons given below, and if it had been used could very easily ex- 
plain the anomalous result St > 0. We raised this issue with one of the authors 
of pQ (Dario Autiero), who informed us in a private communication [S] that the 
Pi{t) were in fact first normalized before being added together. This avoids the 
above-mentioned flaw, but we believe that it may still be useful to describe this 
effect, because the references [TJI3] are not clear (or if anything, clear the wrong 
way!) on this important point. 

Treating (for the moment) Pi(t) and rii(t) as interchangeable, the problem 
with adding up the Pi(t) is that it may be that ai — J pi(i)d£ depends on i, 
i.e., that some pulses contain more protons that others. Only those pi(t) leading 
to a (single, we assume) neutrino detection event are selected. Conditioned on 
Pi(t) giving rise to a single neutrino event, the time of this event (or rather, of 
the proton causing the event) is simply a random variable with density given 
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by the normalized version %{t) of Pi(t). Hence the statistically proper thing to 
do would be to first normalize each Pi(t) for each i corresponding to a neutrino 
event, and then add up the resulting Pi(t), and then normalize again (i.e., divide 
by the total number of detections) to get the aggregate distribution w(t). 

Summing the Pi(t) and then normalizing corresponds to taking a weighted 
average of the correct distributions Pi{t), with each weighted by a^. This is 
known as 'size-biasing'. If there is a correlation between and the shape of 
the waveform pi (t) (which seems very plausible given that the individual pt (t) 
seem to differ significantly), then the size-biased average w(t) may differ from 
w(t). It is quite possible that these averages may have a very similar shape, 
but with a small time offset. This could lead to an error in the estimate of St 
that would not be picked up by the tests described in [I], such as the Monte 
Carlo simulation, assuming these tests also used w(t) rather than uu(t) or the 
individual Pi(t). 

Adding up the Pi(t) before normalizing would make sense if all Pi(t) were 
included, not just those corresponding to neutrino detection events (or, at least, 
the subset of the pi (t) to be added together is chosen without regard to whether 
or not a neutrino was detected). This would construct a different unbiased es- 
timator of the aggregate distribution. It may be counterintuitive that simply 
ignoring waveforms that did not lead to a detection can introduce an error. 
However, taking this point of view, selecting events leading to a detection in- 
troduces size-biasing, which must then be reversed by normalizing. We believe 
that summing all pi{t) is unlikely to be the best way to proceed statistically, 
since it throws away information about which neutrino events occurred, so from 
now on we assume that i indexes only those pulses for which a neutrino was 
detected. 

3 Using individual proton waveforms 

From now on we assume that all individual waveforms pi (t) and rii (t) are normal- 
ized. We ignore the known time offset At, assuming that the data Ti now repre- 
sent detection times from which At has been subtracted. Later, we shall adopt 
the following model: independent observations iVi, N2, ■ ■ ■ , iVjv are drawn from 
ni(t), ri2 (t), ■ ■ ■ , njv(i)- The measured arrival times Ti, . . . , Tjv are then given 
by Ti = Ni — 5t + Jj, where St is the parameter we wish to measure and the 
Ji represent small noise in the form of "jitter" . The actual observed values are 
ti,£2) • • • j tN, as in [TJ Section 7], where N = 16111, but we use Ti, T 2 , . . . , T/v 
when we want to think of these observations as random variables. 

For the moment, assume that Ji = and Pi{t) — rij(t). It is then a purely 
mathematical problem to estimate St, since we have (temporarily) removed all 
sources of errors in this description. A good method to use here is maximum 
likelihood estimation: if S is a candidate value for the unknown shift, then 
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writing T for the collection (T\, Ta, . . . , Tjy), the log likelihood is 

N 

L(S;T)=Y,log(p i (T i + S)) (1) 

i=l 

and the maximum likelihood estimator (MLE) of St is 6t(T), the value of 5 that 
maximizes L(<5;T). 

Intuitively, most of the statistical benefit of a good estimator for 5t comes 
from the parts of the input waveforms, Pi{t), that are varying the fastest. We 
can quantify this by taking expectations over Ti to look at typical values of 
-L(<5;T): the expectation is 

/oo 
Pl (t + St)log( Pl {t + S))dt. 
-oo 

Differentiating with respect to S and substituting t for t + St in the integral, 
we see that 

EL'(St)=J2 f°° Mt)^-dt = 0, (2) 

i J -oo PiV-) 

which means that the MLE will be asymptotically unbiased. — EL" (St) (the 
Fisher information) is a measure of how responsive the likelihood is to changes 
in 5, and so quantifies the information available. It can be identified with the 
expected second derivative of the graphs in [TJ Fig. 8] at the true value 5t (which 
may be slightly different from the second derivative at the maximum of these 
graphs). 

The MLE asymptotically achieves this Fisher information, which has an 
expression independent of the unknown dt: 

EL"(St) = 5>(p,()) =J2 r ^T dt ' ( 3 ) 

(More precisely, assuming the integral converges, then if the distribution or 
family of distributions is fixed and the number of samples tends to infinity, then 
the standard deviation of St will scale with (— EL" (St))^ 1 / 2 .) 

As expected, the more wiggly the waveforms are, the better, especially if the 
wiggle is where Pi(t) is small. In practice, there are limits to how much use can 
be made of the regions where Pi(t) is small, due to small uncertainties in the 
value of Pi(t), and to the limited number of samples available; the latter effect 
means that the asymptotic regime is not necessarily attained. 

In contrast, treating the samples as independent samples from a single ag- 
gregate distribution w(t) — N~ 1 ^2f =1 pi(t), our estimator is the maximum of 
the log likelihood in (flj with wQ in place of PiQ- The information available by 
this method is NX(wQ), and it is easy to check that NX(wQ) ^ J2iLiZ(Pi(j), 
with equality if and only if all the Piit) are equal. Thus averaging the Pi(t) will 
result in a less sensitive (higher variance) estimator. 
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The interpretation of this is that, as might be expected, the procedure de- 
scribed in [Tj of averaging together all the proton distributions will wash away 
information that is present in the individual distributions. The amount of in- 
formation lost will depend on how different the individual distributions are, but 
comparing the graphs in Figures 4 and 9 of pQ it seems that these individual 
distributions are indeed very different, and much more wiggly than the aggre- 
gate distribution. This suggests that an estimator of St based on individual 
distributions could be considerably more accurate than the existing estimator. 
(Even if it is not possible to effectively take advantage of the fine structure of 
the Pi, it should be possible to benefit from the 5-peak structure much more 
strongly present in Figure 4 than in Figure 9.) 

The above argument assumed that rii(t) — Pi(t) and Ji = (i.e., Ti = 
Ni — St): to be correct it should have been written in terms of Ni — St and 
rii(i), not Ti and Pi(i). Since Ji and Uiit) are unknown, we may be tempted to 
forget about these differences, but unfortunately the estimator St(T) so obtained 
might be quite susceptible to small discrepancies between Pi(i) and n,i(t), or to 
jitter Ji. For example, for a particular i, we might have Pi(t) = for t near to 
ti + St, which would cause it to be impossible to estimate St near its true value, 
regardless of any amount of other data. 

According to a remark in a private communication [5], this kind of effect 
(specifically "small white noise (electronics) on the baseline") in each Pi(t) is 
the reason why individual Pi(i) were not used in [T]. However, we believe that 
it should be possible to construct estimators that are robust against such ef- 
fects, and therefore enable us to tap into the large amount of extra information 
available from the individual Pi(t). This is the subject of the next section. 

Note that averaging the pi (t) does not completely eliminate noise; it merely 
reduces its variance. When using statistics to extract a very delicate result 
(order of 10ns accuracy) from a wide spread (approx 10/xs), the small residual 
discrepancy between ni(t) and 53 mav ve t be important. In other words, 
whatever method is chosen (averaging pi(t) or not), it is still necessary to have a 
robust statistical appraisal of the effects of possible discrepancies between pi (i) 
and rii(t). 

It seems then that a proper statement of a statistical analysis of this type 
should always be preceded by a caveat of the form "Assuming that Pi(t) and 
rii{i) can differ in the following sorts of ways, then..." . This is the limit of what 
can be said without further experimental data, since if we have no bound at all 
on how different Pi(t) and riiii) can be, then obviously no useful information 
can be extracted from any statistics. 

4 Constructing robust estimators 
4.1 General procedure 

In accordance with the above prescription, we first decide on the class of dis- 
crepancies between the actual distribution of T = (Ti, . . . , T/v) and its idealized 
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version that we shall work with. Then we choose an estimator, and finally we 
measure the bias and efficiency of the estimator by finding lower bounds on the 
probabilities of confidence intervals centred on the estimator. 
We shall assume that 

P%(t) = m(t) +Ei(t), 

where 

/oo 
\£i(t)\dt < 2e, (4) 
-oo 

corresponding to a total variation distance of at most e. We write pi ~ rii 
whenever this holds, and p«nto indicate that it holds for every i. 

We assume that N = (Ni, . . . ,Nn) consists of independent samples from 
the probability densities nx(t), . . . , n_/v(i), and that 

Ti = Ni - St + J t 

where 

\Ji\ < J (5) 

for each i. 

We think of £i(t) as representing "noise" in the measurement p$ of Hi, and 
the "jitter" Ji as including errors in the measurement of the arrival times, and so 
on. There is some redundancy here, since certain types of difference between the 
distributions Pi and can also be seen as jitter. Note that these discrepancies 
are worse than statistical (independent random) or systematic (random, but 
constant over i) error, since they are assumed to be chosen non-randomly — 
"maliciously" — in such a way as to change the estimator from the true value by 
as much as possible. The motivation for this is that we may not be sure which 
errors are independent, and we wish to avoid having to describe all possible ways 
in which they might be correlated. For example, we should like our estimator 
to be robust against small timing errors that are somehow correlated with the 
time offset of the proton within the pulse, or timing errors that result from very 
slight unknown changes in distance and hence TOF c over the months and years. 

If there were no jitter, then the 200MHz signal present in each Pi(t) ([TJ 
pp. 5-6]) would be extremely useful in pinning down a good estimate of St. 
But to be on the safe side we presume that there is jitter of the order of at 
least 2.5ns, which means that the 200MHz information is illusory, and in fact 
we must be careful not to try to make use of it, since +/- 2.5ns jitter could 
cause disproportionately large changes in the likelihood by means of aligning, 
or anti-aligning, the peaks of the 200MHz signal in pi(t) with the data. This 
argument illustrates the kind of effect we need to be robust against. 

If we knew the distribution of T, then we could define any estimator St(T), 
and establish statistical bounds on \St(T) — St\ (i.e., a confidence interval) simply 
by repeatedly generating samples from the (vector valued) distribution T and 
evaluating the difference |#i(T) — St\. (Since St is unknown, in general one has 
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to consider all possible values of St, but as we shall see, this is not necessary 
here.) Of course, we do not know this distribution exactly. 

Our assumptions imply that we can write Tj = N% — St + Ji, where the iVj 
are independent with distributions (t) satisfying n, ~ Pi- Note that we make 
no independence assumptions concerning the Ji or Tj. If we can show that 

Vn rj p : P (V| Ji\ < J : |<fi(T) - £t| < iu) > s (6) 

then the estimator St(T) can be said to be within w of <5f with a confidence 
of s, whatever the unknown rij(t) and areH For this to be useful we need 
two properties: (i) that we haven't given away too much, i.e., that values of 
w and s satisfying (|6]) give reasonably small error bounds, and (ii) there is an 
effective way to show that © is satisfied for given w and s, since, for example, 
simply exhausting over the space [—J,J] N of all possible jitters is computa- 
tionally impractical. Fortunately, it is possible to satisfy both requirements 
simultaneously. 

4.2 Choice of estimator 

The estimators we shall consider here can be thought of as modified maximum 
likelihood estimators: we choose a modified log likelihood, fi(t), then, analo- 
gously to (TT]), define St(T) to be the value of S that maximizes the modified 
likelihood: 

L(S;T) = J2fi(Ti + S), 

(7) 

5t(T) = argmax(L(£;T)). 
s 

Although the method we describe is mathematically rigorous for any choice 
of fi(t), the results (width of the final interval) will vary depending on the choice 
of fi(t), which must be adapted to the real data Pi{t) and assumptions made in 
place of dU) and ©. 

We shall consider fi(t) defined by 

/oo 
log(max{pi(u),p min })</> Q (i - u)du, (8) 
-oo 

where 

(f> a (x) = — =cxp(-a; 2 /(2a 2 )) 
av Z7r 

is a Gaussian with width a, and p m in and a are suitably chosen. (Convolving 
Piit) with <fi a (t) before taking the logarithm also appears to work reasonably 

2 For simplicity we intially consider symmetric confidence intervals, and indicate a suitable 
modification for asymmetric confidence intervals at the end of this section. This may be more 
appropriate if the final estimator has a small asymptotic bias. 
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well, though convolving after taking the logarithm, as here, is slightly easier to 
reason about.) 

Before turning to the verification that such an fi(t) gives good results (nar- 
row confidence intervals), we briefly motivate this choice for fi(t). 

Setting a minimum probability level, p m i n , will greatly reduce sensitivity 
to noise where Pi(t) is small, and hence log(pi(i)) is large in magnitude. This 
change will not affect the asymptotic bias in the idealized case (when ni — pi 
and Ji = 0). Indeed, in place of ^ we have the modified bias formula 

/oo p oo 

Pi m(t)dt=-j2 Pi(t)fimt. (9) 
-oo i J-oo 

When fi(t) = log(max{p 4 (u),p min }), it is easy to check that B(fi(),PiQ) = 0, 
so putting a floor on the value of Pi(t) does not in itself significantly bias the 
estimator. It does increase its variance; this is part of a trade-off between 
variance of the estimator as applicable in ideal conditions, and sensitivity to 
jitter and noise. 

The other modification in (JSJ is convolution with the Gaussian <p a ; for the 
OPERA data this will remove the 200MHz signal if a > 5ns, and can be 
expected to prevent jitter from disproportionately affecting the likelihood if 
q» J. There is an effect on the bias, but it is only of order 0(a 2 ) as a — >• 
since 4> a {t) is symmetric under t -R- —t. 

4.3 Analysis of significance of estimator 

The unknown parameter St appears in , but can easily be eliminated. Indeed, 
the estimator we use is translation invariant, in that 

8t{T x +t,T 2 + t,...,T n + t) = Si(T u T 2 , . . . ,T N ) - r. 

(The minus sign here comes from the following the sign convention for St in pQ.) 
The difference St(T) — St that we arc interested in can thus be written as St(T), 
where % = T % + St = N % + J,. Let 

L(S; T) = L(S; N, J) = £ /*(JV 4 + J t + S), 

i 

m(N,J) = argmax(L((5;N,J)), 

s (10) 

s 1 (w) = inf P(|m(N,j)| < w) 

= inf P(Vje[-J,J] Ar :|m(N,j)K W ). 

In the second last line, the infimum is over all allowed distributions of T — N+ J; 
the final expression follows since we make no assumptions on J other than that 
\Ji\ ^ J for each i, so the worst value j of J can be chosen after seeing N. As 
discussed above, St(T) ± w is a confidence interval for St with confidence level 

8x(w). 



9 



Unfortunately, the space of possible values of the (depending on the noise) 
and the space [— J, J] N of jitter values are high-dimensional. Because the posi- 
tion of the maximum of L(8) can theoretically change a lot given a small change 
in one of the summands fi(Ni + Ji + 5), optimizing over these spaces is likely 
to be computationally intractable. To cope with this, we replace Si(w) with 
a possibly slightly weaker (though in practice very similar) estimate based on 
derivatives. 

To simplify the exposition slightly, we assume a priori that the maximum 
of L(6) will occur in a limited range [— w max , w max \. (In practice, the probabil- 
ity of this assumption failing can be estimated by techniques similar to those 
described below.) We use the observation that, in this case, if the derivative L' 
is positive on [— w max , —w] and negative on [w, w max ], then the maximum over 
[—w max ,w max \ lies in [— w,w]. The point is that we can express the derivative 
of L as a sum of individual derivatives, and bound them separately. Thus we 
set 

£» min (5,N) = infL , ( ( 5;NJ), 
j 

Anax(<5,N) = supL'( ( 5;N,j), 

j / (11) 
s 2 (w) = inf P Vi e [-w m ax, -w] D min (6, N) > and 



V<5 G [w,W max ] Anax(£,N) < 



where the infimum is over all allowed distributions of N, i.e., over all possible 
values of the noise. 

This ensures that whatever the unknown noise and jitter, with probability at 
least S2(w) the function L(S; T) = L(5; N, J) will be increasing on [— w max , —w] 
and decreasing on [w, u>max], and so will have a maximum in [— w,w] that is 
global over [— u; maX) w max ] . This implies that |m(N,J)| < w, assuming that 
L(S; N, J) does not have a rival maximum further away than w max . 

The use of w max is a technicality, and can be removed with a slightly re- 
fined version of this argument, but we retain it for simplicity. (Alternatively, 
it happens that for our approximate reconstruction of the OPERA data, even 
choosing w max as large as 250ns has only a negligible effect on our final confi- 
dence bound (s3(w); see below). But the interval [—250ns, 250ns] is wide enough 
to encompass all plausible values of St.) 

The advantage of calculating a version of s(w) in terms of the derivative of 
L as in (fTTj) , is that the optimization over j becomes tractable: it decomposes 
into separate optimizations over each ji . For example, 



Anin(a,N)=infL'(<5;N,j) 
j 

= y-m£fl(Ni+ji + S). (12) 



There does not seem to be a simple way to handle the final infimum in (llip , 
but we may replace it with a sum of more easily calculated terms while only 
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modestly weakening the confidence bound. From now on we discretize time, by 
rounding all T; to multiples of some small constant, say Ins. The effect of any 
errors this introduces is bounded by increasing J by 0.5ns. All derivatives then 
become simply differences of consecutive values. For 5 G [— w max ,— w] let Fg 
be the "failure" event that J5 m i„((J ) N) ^ 0, and for S e [w, w ma x] let Fg be the 
event that D max ((5, N) ^ 0. Then s 2 (w) as defined above is the infimum of the 
probability that no Fg holds. Hence 



S2 



H^mfjl- Yl W)} 

ii)^|(5|^«> mox 

> l- ]T su pW) = s sH- 



(13) 



'!«^|<5|^'!U ma , x "~ P 



The final supremum is easy to calculate. Suppose, for example, that 5 > 0. Then 
Fg is the event that Sg = £\ gi,g(Ni) ^ 0, where gi,g(x) = sup^^ 7 /• (x + j + 6) 
is a function of x that is easy to calculate from the known data. The assumption 
ni ~ Pi implies that iV, can be coupled with a variable Xi having the known 
density Pi(t) so that F(Xi ^ Ni) ^ e. If we choose the distribution of Ni 
by removing the "most helpful" e- fraction of the distribution of X i7 i.e., the 
part where g i g(X i ) is smallest, and replacing it by probability e of having the 
least helpful value, then the distribution of this individual summand gt y g(Ni) is 
"worse than" (stochastically dominates) any other possible distribution where 
rij ~ Pi- Since Sg is a sum of independent terms, the worst case overall is given 
by combining these individual worst case distributions. (Note that the worst 
case will be different for different values of 8.) 

It remains to estimate the probability that this sum of N independent vari- 
ables has the right sign. This can be done efficiently by the method described in 
Appendix 1, and a short example program implementing this can be found as- 
sociated with this preprint. This could also be done by Monte Carlo simulation, 
though obviously it would be quite slow to estimate very low tail probabilities 
this way. 

A slight modification will improve the confidence bound in the case that there 
is a small bias in the estimator. Instead of summing over S € [— w max , —to] U 
[w, u>max]j w e sum over S e [— w max , — w + b] U [w + b, w max ], on the assumption 
that the modified estimator is 

5t(T) = arg max(Z((5; T)) + b, (14) 
s 

where b = b(w) is determined once and for all by optimizing the resulting 
confidence estimate: 



S4,(w) = 1 — min 

|f>|^«>max- 



/ \ 

Y supP(F 5 ) 

5G[-Wmax,-™+6]U 



Ot [ — '"/max,— W-\-0\U 
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OPERA COMPARISON 1.4 x 10" b 473 



Table 1: Tail confidence values 1 — 54(30), showing the confidence of the es- 
timator being within 30ns of the true value under various different robustness 
scenarios. p' mia is linked to p m i n by IP (AT, < p m j n ) = p' min - The last column shows 
the equivalent number of standard deviations for a Normal distribution: S4(30) 
is equal to the probability that a standard Normal is less than a in magnitude. 
The last row shows the tail confidence figure using the methods of [1] based on 
a 6.9ns statistical error. 

5 Performance of estimator 

In order to properly evaluate the performance of the <5i(T) as defined by (fl"4)) . we 
would need to know the 16111 proton waveforms, Pi(t). These do not appear to 
be publicly available at the time of writing, so instead we use the Pi(t) given by 
[H Fig. 4] and, on the assumption that it is typical, set all pt equal to this one. 
This graph, whose axes are not displayed, was interpreted on the assumption 
that the vertical axis is a linear scale with an offset. The zero point was assumed 
to be at the mean of the left and right tail values, which are assumed to be noise. 
After that, a plausible 200MHz signal was added. 

In addition to the Pi(t), we would need to know what assumptions can be 
made about the various sources of error. Again, these are not described in [TJ, 
so we use the error model described in Section 14.11 with < J ^ 3(ns) and 
e = 0, 10~ 4 or 10" 3 . Then the estimator parameters, a and p m i n , were chosen 
to optimize the confidence for a 30ns error (w = 30). 

In practice, the amount of smoothing chosen was always large enough (a ^ 
22) to filter out almost all of any 200MHz signal. We might hope that the 
majority of the statistical noise would also be eliminated by such smoothing, 
leaving e = 10~ 4 to account for any remaining "malicious" noise. 

The results shown in Table[T]can be compared with the statistical component 
of the error as reported in [TJ. We note that 6.9 (stat.) from [TJ p. 19] means that 
the confidence that the estimator from [1] is within 30ns of the true value would 
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be 1 - $(-30/6.9)/2 = 1 - 1.4 x 1(T 5 . If we assume that J = 2, e = 10~ 4 is 
sufficiently pessimistic, then the corresponding tail confidence using the methods 
of this paper would be 2.7 x 10~ 7 . Alternatively, using these methods, this same 
level of significance is reached for the same confidence bound of 30ns using only 
approximately (4.3/5. 1) 2 = 71% (in fact 74%) of the current requirement of 
N = 16111 observations. If J = 1 is realistic, then 9800 observations (61%) 
would be sufficient for the same confidence level. 

These comparisons obviously rely on the assumption that we are using a 
typical pi, and the assumption that our jitter and noise values are appropriate, 
though we would expect these methods to be useful to at least some extent 
in any case. Note that we are comparing a method with provable robustness 
against (the given type of) jitter and noise with one where the effects of these 
are not quantified. If the method of [1] were adjusted to allow for these effects, 
the difference might be greater, although it is plausible that there is little noise 
effect using the method of [T], since that is the benefit of adding up different p^. 

6 Conclusions and discussion 

We have described what may be a more effective and more robust way of 
analysing the OPERA neutrino time-of-flight data, and in general data from 
similar experimental set-ups. To do this we have presented some statistical 
techniques together with an analysis of their effectiveness. 

It is only possible to properly gauge the effectiveness of these techniques 
when applied to the OPERA data with access to the full experimental data, 
which does not seem to be publicly available at the present time. 

It is possible that there are types of differences between the proton wave- 
forms, Pi(t), and the distribution of neutrino detection events other than the 
noise and jitter we have considered here. For example, in Section 8.1 of [3] it 
is mentioned that there are spurious oscillations with periods 30ns and 60ns in 
some of the Pi(t). These are not mentioned in the more recent OPERA pa- 
per PP, but if they are still present amongst the 16111 OPERA samples, then a 
new evaluation of estimator effectiveness ought to be carried out, and possibly 
a new kind of estimator would be required. It may be possible to use techniques 
similar to those of this paper to do this, though it is not possible to say for sure 
without seeing a more detailed exposition of the nature of these oscillations. 
In [3] it is explained that the spurious oscillations are filtered out with an 8MHz 
low-pass filter; the necessary evaluation of the effect that this may have on the 
estimator is not described there. It may be thought that the Monte Carlo tech- 
niques of [3] and [1] would be sufficient to catch a poor or biased estimator. 
However, these Monte Carlo simulations appear to be based on samples from 
the aggregate (summed) waveform distribution. This is not correct in principle 
and will fail to catch some classes of problems. In addition, such Monte Carlo 
methods lack the ability to simulate worst-case situations, as we have sought to 
do in this paper in the case of noise and jitter. 
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7 Appendix 1: Sums of independent variables 



We outline here the (fairly standard) method we used to bound V(S ^ 0), where 
S is a sum of independent variables A\, . . . ,An with ES ^ 0. The method is 
based on Cramer's exponential tilting trick plus the Berry-Esseen theorem. For 
notational convenience we take the Ai to have the same distribution A; the 
method also works for different distributions by using a single tilting parameter 
9 chosen to achieve ET = J^i = below. 

Let a{x) denote the probability density function of A. By assumption MA = 
J xa(x)dx ^ 0. It follows (unless A is never positive) that there is some 9^0 
such that ILAe eA — 0, i.e., such that J xe 0x a(x)dx = 0. Let B denote the 
distribution with density function 

b(x) = Z" 1 e 9x a(a;), 

where 



is a normalizing constant. Then K[B] =0. Let S denote the sum of N indepen- 
dent variables with distribution A, and T the sum of N independent variables 
with distribution B. A standard calculation shows that their densities are re- 
lated by 

t{x) = Z- N e 6x s{x) and s(x) = Z N e - 6x t(x). 

It follows that 

/•OO 

F(S >0)= I Z N e- 9x t{x)dx. 

Let 



o 



POO 

T(x) =P(T^x)= / t{y)dy. 
Then (integrating by parts) we can rewrite the formula above as 



\S > 0) = Z N T(0) - Z N / 9e- ex T(x)dx. 



o 

Of course, we do not know T(x) in a practical way, but by the Berry-Esseen 
theorem (with a modern form of the constant from [6]), we have 

\T(x) -U(x)\^ V = 0-4785^1^^-1/2 

where U (x) is the probability that a normal distribution with the same mean 
(0 here) and variance as T exceeds x. The worst case is given by taking T(0) — 
U(0) + r) = 1/2 + r) and T{x) = U(x) - r? for x > such that U(x) > 77, and 
T(x) = for larger x. Writing r = 9a, where a 2 is the variance of T, and 
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substituting x — ay in the integral, this gives 



F(S ^ 0) s: Z N (l/2 + r 1 ) - Z N / 6e- ex max{U(x)-r 1 ,0}dx 

Jo 
1*00 

^ Z N (l/2 + r,)-Z N 9e- 9x (U(x) - rj)dx 
Jo 

/>CX> 

= Z N (l/2 + 2r,) - Z N / 9e- 6x U(x)dx 
Jo 
f' 00 

= Z N {l/2 + 2ry) - Z N / re~ TV {\ - $(y))dy 
Jo 

= Z N (e/ 2 /H(-r) + 2f 1 ), 

where $ denotes the distribution function of a standard Gaussian random vari- 
able. 
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8 Appendix 2: Summary of notation 



Pi(t) The i th distribution output by the waveform digitizer, with unspec- 
ified time origin, normalized from Section [3] onwards. 
rii(t) The normalized i th detection-event distribution, where the origin of 
time is that of Pi(t), offset by the timc-of- flight of neutrinos from 
CERN to OPERA, together with all known measurement delays. 
(See Introduction for details.) 
St The (unknown) early arrival time of the neutrinos at OPERA, i.e., 
TOF c - TOF„. 
Si(T) Estimator for St. 

N The number of neutrino detection events. Taken to be 16111. 

U The measured observation times of neutrino detection events, in the 

same timeframe as that of rij(t). 
Tj Arrival times in the statistical model: the U are the actual measured 

values of the TJ. 
T The collection (Ti,...,T N ). 

N A sequence of independent random variables Ni,..., Nn with den- 
sities TLi. 

fi A function, derived from pi, that will play the role of logp; in a 

modified-log- likelihood estimator. 
8 In Section 14.21 S is a candidate value for the unknown St. In Sec- 
tion |43l St has been subtracted out and S = is the true value. 

Ji Small (actual, unknown) "jitter" defined by T,; = Ni — St + Ji. 

J The collection (Ji, Jyv). 

ji Candidate Ji when optimizing over all possible jitter, 
j The collection (j 1 ,...,j N ). 
e Maximum noise considered fSection l4.ip . 
J Maximum jitter considered: |Jj| ^ J (Section I4.1j) . 
Pmin Floor parameter used in defining modified MLE ( Section 14.21) . 

a Smoothing parameter used in defining modified MLE (Section I4.2[) . 
Si(w) (i = 1,2,3,4) Successively weaker, but more tractable/better, ver- 
sions of the confidence bound: F(\St(T) — St\ < w) ^ Si(w). 
$(x) F(X < x) where X - N(0, 1). 

References 

[1] The OPERA collaboration, Measurement of the neutrino velocity with the 
OPERA detector in the CNGS beam, arXiv:1109.4897vl. 

[2] The MINOS collaboration, Measurement of neutrino velocity with the MI- 
NOS detectors and NuMI neutrino beam, Phys. Rev. D 76 (2007), 072005 



6 pages 



16 



[3] G. Brunetti, Neutrino velocity measurement with the OPERA experiment 
in the CNGS beams, PhD thesis, in joint supervision from Universite Claude 
Bernard Lyon- I and Universita di Bologna, 2011. 

http: //operaweb . lngs . inf n. it : 2080/Opera/ptb/theses/theses/Brunetti-Giulia_phdthesis .pdf 

[4] J. Knobloch, Is there a neutrino speed anomaly?, arXiv:1110.0595v2. 

[5] D. Autiero, personal communication, 26th September 2011. 

[6] I. S. Tyurin, Refinement of the upper bounds of the constants in Lyapunov's 
theorem, Russian Mathematical Surveys 65 (2010), 586-588. 



17 



